 window shift


Optimising The Input Window Alignment in CD-DNN Based Phoneme Recognition for Low Latency Processing

arXiv.org Machine Learning

A key factor that determines the usability of applications based on speech recognition is the latency, or lag, of the system. In dialogue systems, for example, long latencies may disrupt the natural turn-taking in the human-machine conversation. In other applications the lag may be even more critical. A typical example involves systems that use ASR to drive the lip movements of an avatar in real time to support telepresence [3, 4, 5]. The latency in a typical speech recogniser based on a hybrid between Neural Networks (NNs) and Hidden Markov Models (HMMs) is determined by a number of factors:
- the hardware (sound card) introduces some lag in digitising the speech samples and making them available to the drivers. Typical values are in the order of milliseconds;
- the speech samples are returned by the driver in buffers of a certain size (this could be as long as half a second, but can be reduced to a few ms);
- in spectral-based feature extraction, speech samples are grouped into windows (frames), often around 25-40 ms in length;
- many methods for feature extraction also compute time derivatives of the features, which require a number of frames in the past and the future.
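As a rough illustration, the contributions listed above can be added up into a latency budget. The sketch below is a simplified additive model, not the paper's own method: the function names, the 10 ms frame shift, and the example values are assumptions chosen for illustration, and the abstract may go on to list further factors that are not modelled here.

```python
def lookahead_latency_ms(frame_length_ms, frame_shift_ms, future_frames):
    """Delay until the features of one frame are complete.

    Assumption: the frame itself must be fully recorded (frame_length_ms),
    and each future frame needed for time derivatives adds one frame shift.
    """
    return frame_length_ms + future_frames * frame_shift_ms


def total_latency_ms(hardware_ms, buffer_ms, frame_length_ms,
                     frame_shift_ms, future_frames):
    """Simple additive budget over the factors named in the text."""
    return (hardware_ms
            + buffer_ms
            + lookahead_latency_ms(frame_length_ms, frame_shift_ms,
                                   future_frames))


if __name__ == "__main__":
    # Hypothetical values: 2 ms hardware lag, 10 ms driver buffer,
    # 25 ms frames shifted by 10 ms, 2 future frames for derivatives.
    print(total_latency_ms(2, 10, 25, 10, 2))
```

In this model every future frame required by the feature extraction adds one full frame shift to the lag, so look-ahead quickly outweighs the hardware and buffer terms.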


Achieving Approximate Soft Clustering in Data Streams

arXiv.org Artificial Intelligence

In recent years, data streaming has gained prominence due to advances in technologies that enable many applications to generate continuous flows of data. This increases the need for algorithms that can process data streams efficiently. Additionally, the real-time requirements and evolving nature of data streams make stream mining problems, including clustering, challenging research problems. In this paper, we propose a one-pass streaming soft clustering algorithm (one in which the membership of a point in a cluster is described by a distribution) that approximates the "soft" version of the k-means objective function. Soft clustering has applications in various aspects of databases and machine learning, including density estimation and learning mixture models. We first achieve a simple pseudo-approximation in terms of the "hard" k-means algorithm, where the algorithm is allowed to output more than $k$ centers. We then convert this batch algorithm to a streaming one in the "cash register" model, using a recently proposed extension of the k-means++ algorithm. We also extend this algorithm to the case where the clustering is done over a moving window in the data stream.
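To make the idea of soft membership concrete, the sketch below turns the squared distances from a point to a set of centers into a probability distribution. It is only one common way to define soft assignments (a softmax over negative squared distances) and is not taken from the paper; the function name `soft_memberships` and the stiffness parameter `beta` are assumptions introduced for illustration.

```python
import math


def soft_memberships(point, centers, beta=1.0):
    """Return a distribution over centers for a single point.

    Assumption: membership is proportional to exp(-beta * squared distance).
    Larger beta pushes the result towards a hard (one-hot) assignment.
    """
    sq_dists = [sum((p - c) ** 2 for p, c in zip(point, center))
                for center in centers]
    # Subtract the smallest distance before exponentiating to avoid underflow.
    nearest = min(sq_dists)
    weights = [math.exp(-beta * (d - nearest)) for d in sq_dists]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    centers = [(0.0, 0.0), (2.0, 0.0)]
    # A point midway between the two centers belongs equally to both.
    print(soft_memberships((1.0, 0.0), centers))
    # A point sitting on the first center belongs mostly to that center.
    print(soft_memberships((0.0, 0.0), centers))
```

A hard k-means assignment corresponds to the limit of large `beta`, where all of the mass goes to the nearest center.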